{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "BipedalWalker_Example.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/AI4Finance-LLC/ElegantRL/blob/master/BipedalWalker_Example.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "c1gUG3OCJ5GS"
      },
      "source": [
        "# **BipedalWalker-v3 Example in ElegantRL**\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FGXyBBvL0dR2"
      },
      "source": [
        "# **Part 1: Testing Task Description**\n",
        "\n",
        "[BipedalWalker-v3](https://gym.openai.com/envs/BipedalWalker-v2/) is a classic task in robotics since it performs one of the most fundamental skills: moving. In this task, our goal is to make a 2D biped walker to walk through rough terrain. BipedalWalker is a difficult task in continuous action space, and there are only a few RL implementations can reach the target reward."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 354
        },
        "id": "-HUVckiDVPhN",
        "outputId": "ea2edb57-2066-4206-fbe0-fb20525efda8"
      },
      "source": [
        "from IPython.display import HTML\n",
        "HTML(f\"\"\"<video src={\"https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/BipedalWalker-v2/original.mp4\"} width=500 controls/>\"\"\") # the random demonstration of the task from OpenAI Gym"
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<video src=https://gym.openai.com/videos/2019-10-21--mqt8Qj1mwo/BipedalWalker-v2/original.mp4 width=500 controls/>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 1
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DbamGVHC3AeW"
      },
      "source": [
        "# **Part 2: Install ElegantRL**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "U35bhkUqOqbS",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "71f0c3a5-ecfc-4a44-f8e7-af491c1d1358"
      },
      "source": [
        "# install elegantrl library\n",
        "!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "Collecting git+https://github.com/AI4Finance-LLC/ElegantRL.git\n",
            "  Cloning https://github.com/AI4Finance-LLC/ElegantRL.git to /tmp/pip-req-build-yykg2fyl\n",
            "  Running command git clone -q https://github.com/AI4Finance-LLC/ElegantRL.git /tmp/pip-req-build-yykg2fyl\n",
            "Requirement already satisfied: gym in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (0.17.3)\n",
            "Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (3.2.2)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (1.19.5)\n",
            "Collecting pybullet\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/e6/9c/7b76db10cdaa69c840b211fe21ce6f31fb80b611b198fe18a64ddb8f374e/pybullet-3.1.0-cp37-cp37m-manylinux1_x86_64.whl (88.7MB)\n",
            "\u001b[K     |████████████████████████████████| 88.7MB 65kB/s \n",
            "\u001b[?25hRequirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (1.8.0+cu101)\n",
            "Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/dist-packages (from elegantrl==0.3.1) (4.1.2.30)\n",
            "Collecting box2d-py\n",
            "\u001b[?25l  Downloading https://files.pythonhosted.org/packages/87/34/da5393985c3ff9a76351df6127c275dcb5749ae0abbe8d5210f06d97405d/box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448kB)\n",
            "\u001b[K     |████████████████████████████████| 450kB 43.1MB/s \n",
            "\u001b[?25hRequirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.4.1)\n",
            "Requirement already satisfied: pyglet<=1.5.0,>=1.4.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.5.0)\n",
            "Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /usr/local/lib/python3.7/dist-packages (from gym->elegantrl==0.3.1) (1.3.0)\n",
            "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (2.4.7)\n",
            "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (2.8.1)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (1.3.1)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib->elegantrl==0.3.1) (0.10.0)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->elegantrl==0.3.1) (3.7.4.3)\n",
            "Requirement already satisfied: future in /usr/local/lib/python3.7/dist-packages (from pyglet<=1.5.0,>=1.4.0->gym->elegantrl==0.3.1) (0.16.0)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib->elegantrl==0.3.1) (1.15.0)\n",
            "Building wheels for collected packages: elegantrl\n",
            "  Building wheel for elegantrl (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for elegantrl: filename=elegantrl-0.3.1-cp37-none-any.whl size=35699 sha256=a9b4488eda0a02d23ae8cc3a8377ae2e2bafbdf830088f68286ef4cab994fbae\n",
            "  Stored in directory: /tmp/pip-ephem-wheel-cache-q4cash6l/wheels/d0/f4/2e/cec0c14b57c2094a2bcef3063f95d758ad1309a640ff100419\n",
            "Successfully built elegantrl\n",
            "Installing collected packages: pybullet, box2d-py, elegantrl\n",
            "Successfully installed box2d-py-2.3.8 elegantrl-0.3.1 pybullet-3.1.0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UVdmpnK_3Zcn"
      },
      "source": [
        "# **Part 3: Import Packages**\n",
        "\n",
        "\n",
        "*   **elegantrl**\n",
        "*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.\n",
        "*   **PyBullet Gym**: an open-source implementation of the OpenAI Gym MuJoCo environments.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1VM1xKujoz-6"
      },
      "source": [
        "from elegantrl.run import *\n",
        "from elegantrl.agent import AgentPPO\n",
        "from elegantrl.env import PreprocessEnv\n",
        "import gym\n",
        "gym.logger.set_level(40) # Block warning"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3n8zcgcn14uq"
      },
      "source": [
        "# **Part 4: Specify Agent and Environment**\n",
        "\n",
        "*   **args.agent**: firstly chooses one DRL algorithm to use, and the user is able to choose any agent from agent.py\n",
        "*   **args.env**: creates and preprocesses the environment, and the user can either customize own environment or preprocess environments from OpenAI Gym and PyBullet Gym from env.py.\n",
        "\n",
        "\n",
        "> Before finishing initialization of **args**, please see Arguments() in run.py for more details about adjustable hyper-parameters.\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "E03f6cTeajK4",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "aba2d931-4805-4d26-bcea-439bc27e7e69"
      },
      "source": [
        "args = Arguments(if_on_policy=False)\n",
        "args.agent = AgentPPO()  # AgentSAC(), AgentTD3(), AgentDDPG()\n",
        "args.env = PreprocessEnv(env=gym.make('BipedalWalker-v3'))\n",
        "args.reward_scale = 2 ** -1  # RewardRange: -200 < -150 < 300 < 334\n",
        "args.gamma = 0.95\n",
        "args.rollout_num = 2 # the number of rollout workers (larger is not always faster)"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "\n",
            "| env_name:  BipedalWalker-v3, action space if_discrete: False\n",
            "| state_dim:   24, action_dim: 4, action_max: 1.0\n",
            "| max_step:  1600, target_reward: 300\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z1j5kLHF2dhJ"
      },
      "source": [
        "# **Part 5: Train and Evaluate the Agent**\n",
        "\n",
 The training">
        "> Training and evaluation both run inside the function **train_and_evaluate_mp()**, whose only parameter is **args**. **args** holds the fundamental objects in DRL:\n",
        "\n",
        "*   agent,\n",
        "*   environment.\n",
        "\n",
        "> It also holds the parameters that control training:\n",
        "\n",
        "*   batch_size,\n",
        "*   target_step,\n",
        "*   reward_scale,\n",
        "*   gamma, etc.\n",
        "\n",
        "> And the parameters that control evaluation:\n",
        "\n",
        "*   break_step,\n",
        "*   random_seed, etc.\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    },
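    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To build intuition for **reward_scale** and **gamma**, here is a minimal sketch of a discounted return computed over scaled rewards. It assumes the scale is applied to each reward before discounting; how ElegantRL applies it internally is not shown in this notebook. `discounted_return` and `effective_horizon` are hypothetical helper names, not part of the library.\n",
        "\n",
        "```python\n",
        "def discounted_return(rewards, gamma=0.95, reward_scale=2 ** -1):\n",
        "    # Accumulate from the last step backwards: G_t = scale * r_t + gamma * G_{t+1}\n",
        "    total = 0.0\n",
        "    for reward in reversed(rewards):\n",
        "        total = reward_scale * reward + gamma * total\n",
        "    return total\n",
        "\n",
        "\n",
        "def effective_horizon(gamma):\n",
        "    # Rough number of future steps that still carry weight: 1 / (1 - gamma)\n",
        "    return 1.0 / (1.0 - gamma)\n",
        "\n",
        "\n",
        "print(discounted_return([1.0, 1.0, 1.0]))\n",
        "print(effective_horizon(0.95))\n",
        "```\n",
        "\n",
        "With gamma = 0.95 the discounting emphasizes roughly the next 20 steps, which is short compared with the 1600-step episode limit of this environment."
      ]
    },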
    {
      "cell_type": "code",
      "metadata": {
        "id": "KGOPSD6da23k",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "431aa2cb-c802-42d0-e892-d4f2a55b0cd7"
      },
      "source": [
        "train_and_evaluate_mp(args) # the training process will terminate once it reaches the target reward."
      ],
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "| multiprocessing, act_workers: 2\n",
            "| multiprocessing, None:\n",
            "| GPU id: 0, cwd: ./AgentPPO/BipedalWalker-v3_0\n",
            "| Remove history\n",
            "ID      Step      MaxR |    avgR      stdR       objA      objC\n",
            "0   0.00e+00    -92.10 |\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JPXOxLSqh5cP"
      },
      "source": [
        "Understanding the above results::\n",
        "*   **Step**: the total training steps.\n",
        "*  **MaxR**: the maximum reward.\n",
        "*   **avgR**: the average of the rewards.\n",
        "*   **stdR**: the standard deviation of the rewards.\n",
        "*   **objA**: the objective function value of Actor Network (Policy Network).\n",
        "*   **objC**: the objective function value (Q-value)  of Critic Network (Value Network)."
      ]
    }
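    ,
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a rough illustration of **avgR** and **stdR**, the sketch below summarizes a list of per-episode returns. It assumes the population standard deviation; the evaluator in ElegantRL may compute these values differently. `summarize_returns` is a hypothetical helper and the sample returns are made up.\n",
        "\n",
        "```python\n",
        "import statistics\n",
        "\n",
        "\n",
        "def summarize_returns(episode_returns):\n",
        "    # Mean and population standard deviation of the episode returns\n",
        "    avg_r = statistics.mean(episode_returns)\n",
        "    std_r = statistics.pstdev(episode_returns)\n",
        "    return avg_r, std_r\n",
        "\n",
        "\n",
        "avg_r, std_r = summarize_returns([-90.0, -110.0, -100.0])  # made-up values\n",
        "print(f'avgR={avg_r:.2f}  stdR={std_r:.2f}')\n",
        "```\n",
        "\n",
        "A large stdR relative to avgR suggests the policy behaves inconsistently across evaluation episodes."
      ]
    }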
  ]
}
